PyDigger - unearthing stuff about Python


Name Version Summary Date
semantic-text-splitter 0.25.0 Split text into semantic chunks, up to a desired chunk size. Supports calculating length by characters and tokens, and is callable from Rust and Python. 2025-03-22 06:57:25
rs-bpe 0.1.0 A ridiculously fast Python BPE (Byte Pair Encoder) implementation written in Rust 2025-03-19 05:58:24
ts-tokenizer 0.1.19 TS Tokenizer is a hybrid (lexicon-based and rule-based) tokenizer designed specifically for tokenizing Turkish texts. 2025-01-30 19:59:44
UniTok 4.3.6 Unified Tokenizer 2025-01-30 14:28:25
pinyintokenizer 0.0.3 Pinyin Tokenizer, Chinese pinyin tokenizer 2025-01-28 09:07:39
pgn-tokenizer 0.1.3 A byte pair encoding tokenizer for chess portable game notation (PGN) 2025-01-25 15:02:55
count-tokens 0.7.2 Count the number of tokens in a text file using the tiktoken tokenizer from OpenAI. 2025-01-09 05:15:28
tokenlens 0.1.6 A library for accurate token counting and limit validation across various LLM providers 2025-01-05 03:40:24
token-vision 0.1.0 A fast, offline token calculator for images with various AI models (Claude, GPT-4V, Gemini) 2025-01-02 19:35:06
optilearn 1.3.6 A package for training neural networks and optimizing models; transferring or copying files from one directory to another; NLP short-word treatment; choosing optimal data for ML models; image scraping; splitting time-series data into train and test sets; handling emojis and emoticons in NLP; word tokenization; listing punctuation marks and English pronouns; and reading text files. 2024-11-28 07:34:30
tokenizers 0.21.0 None 2024-11-27 13:11:23
midi-neural-processor 1.0.3 Tokenize MIDI files for neural network processing 2024-11-25 08:39:21
nlpo3 1.3.1 Python binding for nlpO3 Thai language processing library in Rust 2024-11-11 21:59:26
dir2text 1.0.1 A Python library and command-line tool for expressing directory structures and file contents in formats suitable for Large Language Models (LLMs). It combines directory tree visualization with file contents in a memory-efficient, streaming format. 2024-10-24 14:36:13
jieba3 1.0.2 "Jieba 3" Chinese word segmentation: aiming to be the best Modern Python 3 Chinese word segmentation component 2024-10-12 06:09:52
code-splitter 0.1.5 Split code into semantic chunks using tree-sitter 2024-09-23 06:10:18
taibun 1.1.7 Taiwanese Hokkien Transliterator and Tokeniser 2024-08-31 20:25:01
thongna 0.2.4 Blazing-fast Thai text processing library powered by Rust 2024-08-14 09:29:32
simplemma 1.1.1 A lightweight toolkit for multilingual lemmatization and language detection. 2024-08-08 12:20:45
bleuscore 0.1.3 A fast BLEU score calculator 2024-05-27 03:28:36